03:39
2026-05-23
dev.to
large-language-models
BeeLlama v0.2.0: 164 tok/s on a 27B model, one RTX 3090
BeeLlama v0.2.0 demonstrates that speculative decoding can achieve a 4.4x to 4.93x throughput multiplier on a single RTX 3090, running 27B- and 31B-parameter models at 37 and 36 tokens per second baseline …